# Image to Text

Vit Gpt2 Image Captioning

This is an image captioning model based on ViT and GPT2 architectures, capable of generating natural language descriptions for input images.

Bpe Vocab N OCR

Bpe-vocab-n-OCR is an advanced text extraction tool based on OCR, optimized for generating structured and tokenized output.

Transformers Supports Multiple Languages

BLIP Radiology Model

An image-to-text model based on the transformers library, supporting the conversion of image content into descriptive text.

OCR TextInput Base

A specialized image-to-text model for the financial domain, supporting English text recognition, primarily used for processing image content in financial documents.

Text Recognition

Transformers English

Trocr Base Finetune Numbers

TrOCR is a Transformer-based optical character recognition model designed to extract text content from images.

Transformers English

This model is a fine-tuned version of Microsoft's TrOCR printed text model, specifically designed for Sinhala OCR tasks.

Text Recognition

Transformers Other

An optical character recognition model based on Hugging Face Transformers, specifically designed for recognizing MNIST-style digit images.

Text Recognition

Transformers English

Trocr Base Printed Captcha Ocr

A captcha recognition model fine-tuned from Microsoft's trocr-base-printed model, specifically designed for OCR tasks involving printed text.

Text Recognition

Image Caption Using ViT GPT2

This is an image captioning model based on Vision Transformer (ViT) and GPT2 architectures, capable of generating natural language descriptions for input images.

Trocr Base Fa V2

This is a Transformer-based OCR model specifically designed for recognizing Persian text in images.

Text Recognition Other

An optical character recognition model specialized for Japanese text in manga.

Text Recognition

Transformers Japanese

Donut Base Sroie

A model fine-tuned from naver-clova-ix/donut-base on an image folder dataset; no specific use case is stated.

Text Recognition

An OCR model for Hebrew image-to-text conversion, capable of recognizing Hebrew text in images.

Text Recognition

Transformers Other

Pix2struct Docvqa Base

Pix2Struct is an image encoder-text decoder model trained on image-text pairs, supporting various tasks including image captioning and visual question answering.

Transformers Supports Multiple Languages

Donut Base Sroie

This model is a fine-tuned version of naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.

Text Recognition

Ko Trocr Base Nsmc News Chatbot

This is a proof-of-concept model for Korean text recognition, built on the TrOCR architecture, supporting Korean text extraction from images.

Transformers Korean

Donut Base Sroie

A document understanding model fine-tuned from philschmid/donut-base-sroie.

Text Recognition

Donut Base Medical Handwritten Prescriptions Information Extraction

A fine-tuned Donut model for extracting text information from handwritten medical prescription images.

Donut Base Sroie

A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for image text extraction tasks.

Text Recognition

Trocr Base Printed

A fork of microsoft/trocr-base-printed, specializing in OCR tasks for printed text.

Text Recognition

Doctr Torch Crnn Mobilenet V3 Large French

An optical character recognition (OCR) model based on TensorFlow 2 and PyTorch, supporting multilingual text detection and recognition.

Text Recognition

Transformers Supports Multiple Languages

Vit Gpt2 Image Captioning

This is an image captioning model based on ViT and GPT2 architectures, capable of generating natural language descriptions for input images.

Trocr Base Stage1

TrOCR is a Transformer-based pretrained optical character recognition model developed by Microsoft, suitable for single-line text image OCR tasks.

This is an image-to-text generation model capable of receiving images and outputting descriptive text.

Transformers English

Trocr Small Stage1

TrOCR is a Transformer-based pre-trained optical character recognition model that adopts an encoder-decoder architecture, suitable for OCR tasks on single-line text images.


© 2025 AIbase